In using digital speech for mobile radio, we encounter the problem of 
severe bit-error bursts. Error clustering occurs because the bit duration is 
typically much smaller than that of a signal "fade," and average bit-error 
probabilities greater than 1 percent are not uncommon. For speech com- 
munication over such channels, this paper proposes variable step-size 
differential coders based on explicit (and error-protected) transmission of 
quantizer step size. Specifically, we discuss delta and DPCM coders to be 
referred to as DM-AQF and DPCM-AQF, where AQF stands for adaptive 
quantization with forward estimation (and transmission) of step size. 
(Backward estimation, based on quantized-signal history, has the nice 
feature that the step-size information does not have to be explicitly trans- 
mitted. Furthermore, obtaining this information does not entail any en- 
coding delay. However, due to the dependence of step size on reconstructed 
signal history, backward estimation is often less reliable in the presence 
of bit errors than a scheme based on AQF.) The studies reported in this 
paper cover the problem of step-size determination in AQF, the design of 
time-invariant first-order predictors for DPCM-AQF, and the performances 
of AQF encoders with and without burst-error-protecting ploys such as 
redundant time-diversity coding and bit scrambling. Judging from SNR 
figures and informal listening tests, interesting results are obtained with 
the following 48-kb/s coders: three-bit DPCM-AQF with redundant error 
protection, and DM-AQF using bit scrambling. 

I. INTRODUCTION 

Recent developments in speech digitization 1 have prompted an 
examination of digital coding as a possibility for mobile radio telephony 
that conventionally employs analog techniques for speech transmis- 
sion. Conceivably, much of the signaling supervision and "book- 
keeping" in a mobile radio link can be digital; in this case, if the speech 
were handled digitally as well, it would be simple to interleave the 
voice bits with the control bits for transmission. Digital coding also 
offers the possibilities of inexpensive coder-decoder implementation, 
straightforward speech encryption (by bit scrambling), and efficient 
signal regeneration. Perhaps the greatest incentive for the use of digital 
speech, however, is the thought that a properly designed digital code 
may be more resistant than analog systems to the multipath fading 
that characterizes mobile radio. 

Fig. 1 — Envelope of a Rayleigh fading signal (V/λ = 15.4 Hz). 

Figure 1 shows the envelope of a Rayleigh-fading signal that is 
typical in mobile telephony. 2 An important parameter is the fading 
rate, which is approximately the ratio of vehicle speed V to the carrier 
wavelength λ. For the example in Fig. 1, this ratio is about 15 Hz. 
Note also that the 5 m represent a total travel time of about 1 s at the 
indicated vehicle speed, and that the fading is slow or correlated in the 
sense that a given fade (signal strength below a specified threshold) 
can last for several tens of milliseconds (which will represent several 
hundred speech bits for the codes of this paper). The probability of 
a fade can be decreased by an order of magnitude by the use of diversity 
reception (two-branch, equal-gain or switched diversity, for example). 
But when a fade does occur, the signal is susceptible to noise capture 
as well as to co-channel interference. The end effects, with conventional 
analog transmissions, are impulsive "pops" and "crackles" in the 
received speech. Without explicit signal companding, these effects can 
be severe under "idle-channel" conditions, when no speech is present 
on the channel of interest. By contrast, if "adaptive" digital modula- 
tors are used to transmit speech, it is expected that the variable step- 
size mechanism in these modulators would inherently attenuate the 
impulsive interference signals (manifested as clustered bit errors in the 
digital system) during the silences of the incoming speech. This inter- 
ference-squelching property is, in fact, expected to carry over to the 
"active-channel" condition with an ideally designed adaptive code 
(that exploits the statistics of signal fading or, equivalently, the time 
statistics of the bit-error bursts). Even if such an ideal design should 
be impractical, it is clear that digital coding can straightforwardly 
employ efficient burst-error-protection ploys such as time-diversity 
coding and bit-scrambling. However, these refinements (as well as the 
notion of forward estimation of step size) can involve significant 
amounts of encoding delay. 

In our search for a suitable digital coder, we have used the following 
characteristics as guidelines. The speech bandwidth should be 
representative of standard telephone quality (200 to 3200 Hz); average 
bit-error rates higher than 1 percent are possible at times; and, finally, the 
overall transmission rate should not exceed a nominal 48 kb/s. When 
we refer to a "nominal 48-kb/s rate," we mean that additional channel 
capacity (in the order of 2 to 5 kb/s) may be needed for the trans- 
mission of step-size information. 

A basic contention of this paper is that the "optimum" step size for 
a speech quantizer changes slowly enough with time for the step-size 
information to be transmitted reliably in a special error-protected 
format over a typical mobile radio channel. Thus, although the main 
stream of speech-carrying bits is still subject to errors, the provision of 
a relatively error-free step size will improve the received speech quality 
to a point that makes explicit step-size transmission worthwhile. We 
show that step-size transmitting coders are of interest for bursty as 
well as independent error patterns, and we include a comparison with 
a popular error-resistant syllabic-companded quantizer that recovers 
step size from the bit stream. Following Noll, 3 step-size transmitting 
adaptive coders will be labelled AQF (adaptive quantization with 
forward estimation and transmission of step size), in contradistinction 
to AQB (adaptive quantization with backward estimation). 

The coders of this paper are differential. We discuss both DPCM 
(differential PCM) and DM (delta modulation). It appears from 
experience 1 that conventional time-invariant log-PCM quantization does 
not meet the error-performance requirements of mobile telephony. 
However, the possibility of a well-designed adaptive PCM 1,3,4 definitely 
exists. A promising candidate is the technique of nearly instantaneous 
companding (NIC). 4 

Although our studies have included informal perceptual assessments, 
most performance results in this paper are objective signal-to-noise 
ratios termed SNRT and SNRR. These reflect, respectively, the speech 
quality at the output of the local and remote decoders (T and R stand 
for "transmitter" and "receiver"). Formal definitions appear in Fig. 2. 
As we shall note at appropriate points in the paper, an SNRT-maximizing 
encoder does not, in general, maximize SNRR, and vice versa. 

Our discussions refer to computer simulations that employed band- 
limited (200 to 3200 Hz) speech utterances (2 s or, sometimes, longer 
in duration) and bit-error patterns obtained from fading simulators. 5 
We believe that the main conclusions of this paper should hold for 
broad classes of speech and error patterns encountered in a mobile 
radio environment. However, our numerical results are often reflective 
of the specific data used in our computer simulation. To demonstrate 
real-world variabilities of these numerical results, we have employed 
variable speech data, whenever appropriate. 

Section II of this paper illustrates the time characteristics of the 
simulated burst error channel. Section III discusses the design of a 
DM-AQF coder. The section also demonstrates that simple bit-protecting 
codes are not particularly beneficial with DM-AQF (except for the 
transmission of step-size information). Bit scrambling, on the other hand, 
provides a definite advantage. Suitable sampling rates for DM-AQF are 
shown to be in the order of 30 to 40 kHz. Finally, a performance 
comparison is made between DM-AQF and a representative DM-AQB code. 
Section IV describes the design of a DPCM-AQF coder, and demonstrates 
the utility of time-diversity coding for bit protection. It also indicates 
that a three-bit coder operating redundantly at a nominal 48 kb/s 
(and an 8-kHz sampling rate) is a better choice than a four-bit coder 
operating (also redundantly, but with less bit protection) at the same 
information rate. Section V provides a comparison of DPCM-AQF and 
DM-AQF. 

Fig. 2 — Block diagram of codec simulation. 

II. BIT-ERROR PATTERNS 
2.1 Burst errors 

Two simulated-error sequences were used in this study, representing 
average error probabilities of 0.025 and 0.055. These numbers represent 
channel qualities believed to be typically "much worse than average." 5 
The durations of the error sequences were long enough to simulate the 
transmission of all but the longest of the speech utterances being en- 
coded. For this utterance (which was 9 s long), the bit-error sequences 
were used repeatedly to cover the total speech duration. Simulated 
bit rates ranged from 24 to 48 kb/s. 

Figure 3 displays typical distribution functions for error-burst 
duration D and the error-free interval I. The numbers refer to a subsegment 
of the 0.025 error rate sequence. The error rate is denoted by P(EB), 
where B refers to burst errors. An error burst in Fig. 3 is defined to 
have a local error probability of 1. In other words, an error burst of 
length D_0 implies that D_0 contiguous signal bits are in error. An isolated 
error, for example, is an error burst of length D = 1. The upper rows 
of Table I list the average and median values of burst duration D 
and error-free interval I for the subsequence examined. The ratio of 
D_average to (D_average + I_average) gives the average bit-error probability 
for the subsequence (1.6/51.6 = 0.03). The ratio of D_median to 
(D_median + I_median), by contrast, is as high as 0.23. Of particular interest 
is the fact that I_average >> I_median. This signifies the presence of some 
error-free intervals that are extremely long, together with a 
preponderance of intervals that are (unfortunately) very short (in fact, not much 
longer than D_average). The clustered nature of the errors is somewhat 
more apparent by comparison with average and median statistics that 
apply to an appropriate random error channel: that is, a channel where 
errors occur independently at every sample, but with an average error 
probability that is the same as that of the burst-error channel. The 
lower rows of Table I show those statistics, as calculated for a random 
error channel whose bit-error probability is 0.03. Note that the value of 
D_average is much higher (for the same average error probability) in the 
case of the bursty channel, as expected. 

Fig. 3 — Time statistics of error bursts [P(E) = 0.025, V/λ = 36.2 Hz]. 
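The burst statistics discussed above can be illustrated with a minimal Python sketch. It assumes that the error pattern is available as a list of 0/1 entries; the function names and the toy pattern are hypothetical and are not data from the simulations of this paper. The ratio of averages reproduces the raw error rate when bursts and error-free intervals alternate in equal numbers, which holds approximately for long sequences.

```python
from statistics import mean, median

def run_lengths(errors):
    """Split a 0/1 error sequence into burst lengths D (runs of 1s)
    and error-free interval lengths I (runs of 0s)."""
    bursts, gaps = [], []
    run_value, run_length = None, 0
    for e in errors:
        if e == run_value:
            run_length += 1
        else:
            if run_value is not None:
                (bursts if run_value == 1 else gaps).append(run_length)
            run_value, run_length = e, 1
    if run_value is not None:
        (bursts if run_value == 1 else gaps).append(run_length)
    return bursts, gaps

def burst_summary(errors):
    """Average and median of D and I, plus the ratio D_avg / (D_avg + I_avg)."""
    bursts, gaps = run_lengths(errors)
    d_avg, i_avg = mean(bursts), mean(gaps)
    return {
        "D_average": d_avg,
        "D_median": median(bursts),
        "I_average": i_avg,
        "I_median": median(gaps),
        "error_rate_from_averages": d_avg / (d_avg + i_avg),
    }

# Hypothetical toy pattern: two error-free intervals and two bursts.
toy = [0] * 8 + [1] * 2 + [0] * 6 + [1] * 1
summary = burst_summary(toy)
```

For a measured sequence, a large gap between `I_average` and `I_median` is the signature of clustering described in the text.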

Burst-error patterns, including that of Table I, were obtained from 
a fading simulator. 6 The main components of the simulation were a 
pseudorandom binary input, an FM transmitter-receiver, a Rayleigh 
fader that took into account desired ratios of vehicle speed to carrier 
wavelength, a noise generator, a pseudorandomly modulated carrier 
to approximate the effect of co-channel interference, and the option of 
switched-diversity reception. The numbers for the burst errors in 
Table I represent the impairment for a 24-kb/s signal-bit sequence 
(the bit duration determines the number of bits affected by a fade) 
when the mobile radio link is characterized by two-branch diversity 
reception under the following (worse than average) conditions: 

    Signal-to-interference ratio = 9 dB 
    Signal-to-noise ratio = ∞ 
    Vehicle speed / carrier wavelength = V/λ = (29 mi/h)/(0.353 m) = 36.2 Hz 

Table I — Average and median values 

2.2 Scrambled errors 

Scrambled errors are of interest in sections of this paper that assume 
the scrambling of signal bits for error protection. The idea of bit 
scrambling is to expose adjacent coder bits to channel conditions that 
tend to be statistically independent. If the scrambling is pseudo- 
random, the receiver can put the received bits in proper sequence by 
an inverse unscrambling operation. To avoid the two operations of 
scrambling and unscrambling, the situation was simulated in our ex- 
periment by scrambling the known bit-error pattern and leaving the 
signal bits in their original sequence. 

Error sequences consisted of binary entries (error bits) E that were 
either 0 or 1, and each entry of 1 represented a bit error in the decoding 
of a corresponding signal bit. The scrambling was accomplished as 
follows. The error-bit sequence E was handled in blocks that were M 
bits long, and each bit got a new position, given by a pseudorandom 
number (of bit intervals), as was derived from the current state 
of a maximal-length shift register with log_2 M stages. 6 The value of M 
was set at 1024, and the effect of scrambling is illustrated in Fig. 4, 
which is a snapshot of part of the 0.025 error-rate data. The three 
sections in the figure represent (contiguous) error sequences that are 
1024 bits long (512 per row, two rows per block). In each of the six 
rows, the lower sequence contains the burst errors (letter B), and the 
upper sequence has the scrambled errors (letter S). It is clear that a 
block length of M = 1024 is insufficient for true error randomization. 
Speech recordings have indicated, on the other hand, that values of M 
as small as 64 are sufficient to achieve useful speech encryption with 
signal-bit scrambling, assuming the 24-kb/s bit rate mentioned earlier. 

Fig. 4 — Illustration of error scrambling [P(E) = 0.025, V/λ = 36.2 Hz]. 
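The block scrambling of the error pattern can be sketched in Python as follows. This is only an illustration under stated assumptions: the feedback taps (10, 7), the seed, and the rule that the last bit of each block takes position 0 (a ten-stage register visits only the 1023 non-zero states) are choices made here, not details given in the text.

```python
M = 1024  # block length used in the text

def lfsr_positions(stages=10, taps=(10, 7), seed=1):
    """Visit every non-zero state of a maximal-length shift register once.
    The taps are an assumed primitive feedback choice for ten stages."""
    mask = (1 << stages) - 1
    state = seed
    positions = []
    for _ in range(mask):                     # 2**stages - 1 non-zero states
        positions.append(state)
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        state = ((state << 1) | feedback) & mask
    return positions

def scramble_block(block, positions):
    """Move bit i of the block to positions[i]; the last bit takes slot 0."""
    out = [0] * len(block)
    for i, bit in enumerate(block[:-1]):
        out[positions[i]] = bit
    out[0] = block[-1]
    return out

def scramble_errors(errors):
    """Scramble a 0/1 error pattern block by block (whole blocks only)."""
    positions = lfsr_positions()
    out = []
    for start in range(0, len(errors) - M + 1, M):
        out.extend(scramble_block(errors[start:start + M], positions))
    out.extend(errors[len(out):])             # leftover tail stays unscrambled
    return out

# Hypothetical pattern: one 24-bit burst at the end of a 1024-bit block.
burst_pattern = [0] * 1000 + [1] * 24
scrambled_pattern = scramble_errors(burst_pattern)
```

Because scrambling only permutes positions inside a block, the number of errors per block is unchanged; only their clustering is broken up.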

2.3 Transmission rate and average error probability P(E) 

Our simulations involved signal transmission rates of 24, 32, 40, and 
48 kb/s. It is reasonable, under assumptions of constant baud rate 
(number of channel symbols/second), to expect higher bit-rate trans- 
missions to be subject to correspondingly higher error rates. For 
example, if 24 and 48 kb/s represent two-phase and four-phase modula- 
tions of channel symbols, respectively, at a fixed 24-kHz symbol rate, 
the average error probability in the 48-kb/s system is expected to be 
typically two times* as large as that in the 24-kb/s scheme. 5 In the 
light of this, when we compare similar systems operating at significantly 
different bit rates in this paper (for example, 24 versus 48 kb/s) we 
assume average bit-error probabilities that are appropriately different 
(for example, 0.025 for 24-kb/s transmissions and 0.055 for 48-kb/s 
transmissions). Burst errors and scrambled errors are indicated by the 
notations EB and ES. 

III. DM-AQF 

Figure 5 illustrates the principles of variable step size delta 
modulation with a forward control of step size. The buffer shown in the 
encoder stores N input samples (typically, in linear PCM format) that 
are used to calculate the best step size Δ for the (future) delta 
modulation of the stored input block. The step size Δ is recomputed exactly 
once, and explicitly transmitted to the receiver, for every block of N samples. 
The rest of Fig. 5 merely represents a conventional linear delta 
modulator-demodulator pair. 1 The predictor is assumed to be time-invariant, 
and of first order. The equations describing the delta modulations are 
formally summarized below. 

$$
\begin{aligned}
b_r  &= \operatorname{sgn}(X_r - h_1 \cdot Z_{r-1}), \\
Z_r  &= h_1 \cdot Z_{r-1} + \Delta \cdot b_r, \\
Z'_r &= h_1 \cdot Z'_{r-1} + \Delta \cdot b'_r.
\end{aligned}
\tag{2}
$$

The time indices r and r − 1 are not shown in the figure; however, the 
implied one-sample delay occurs in the first-order predictor. Z and Z' 
are unfiltered staircase functions at the transmitter and receiver. The 
received bit b' differs from b if the error bit E is 1, and the step-size 
information Δ is assumed to be error-protected. For useful delta 
modulation, the sampling rate f should be much greater than the Nyquist 
frequency of the band-limited speech. 

Fig. 5 — Block diagram of DM-AQF codec. 

* Strictly speaking, this number is a function of the carrier-to-noise ratio and the 
modem that is employed. For example, the number can exceed two (for a typical 
carrier-to-noise ratio) if FSK is used as the modulation system instead of PSK (for 
transmitting the speech bits over the analog channel). 
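The delta-modulation recursions of this section can be sketched in Python for a single block with a fixed step size. The sketch makes several assumptions that are not taken from the paper: the integrator starts from zero at the beginning of the block, sgn(0) is treated as +1, and the function and argument names are hypothetical.

```python
def dm_encode(x, delta, h1=0.9):
    """Linear delta modulation of one block with a fixed step size delta."""
    bits, z = [], 0.0
    for sample in x:
        b = 1 if sample - h1 * z >= 0 else -1   # b_r = sgn(X_r - h1 * Z_{r-1})
        z = h1 * z + delta * b                  # Z_r = h1 * Z_{r-1} + delta * b_r
        bits.append(b)
    return bits

def dm_decode(bits, delta, h1=0.9, errors=None):
    """Rebuild the staircase at the receiver; errors[r] == 1 inverts bit r."""
    out, z = [], 0.0
    for r, b in enumerate(bits):
        if errors is not None and errors[r]:
            b = -b                              # received bit b' differs from b
        z = h1 * z + delta * b                  # Z'_r = h1 * Z'_{r-1} + delta * b'_r
        out.append(z)
    return out

# Hypothetical input ramp; one channel error at sample 5.
x = [10.0 * i for i in range(20)]
bits = dm_encode(x, delta=12.0)
clean = dm_decode(bits, delta=12.0)
error_pattern = [0] * 20
error_pattern[5] = 1
noisy = dm_decode(bits, delta=12.0, errors=error_pattern)
```

With h1 below 1, the difference between `clean` and `noisy` decays geometrically after the error, which is the leaky-integrator effect discussed in connection with the predictor coefficient.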

3.1 Design of Δ, N, and h_1 

So that the best step size Δ may follow the statistics of the input 
speech, the following algorithms were examined. 

$$
\Delta = K_1 \cdot \frac{1}{N-1} \sum_{r=2}^{N} |X_r - X_{r-1}| \tag{3a}
$$

$$
\Delta = K_2 \cdot \max_{2 \le r \le N} |X_r - X_{r-1}| \tag{3b}
$$

$$
\Delta = K_3 \cdot \left[ \frac{1}{N-1} \sum_{r=2}^{N} (X_r - X_{r-1})^2 \right]^{1/2} \tag{3c}
$$

Figure 6 plots the signal-to-noise ratio SNRT at the encoder as a 
function of K_n (n = 1, 2, 3) for the above algorithms. The numbers refer 
to 24-kHz delta modulation. Each scheme exhibits an optimal K_n that 
represents the best mixture of slope-overload distortion 
(which predominates for K << K_opt) and granular noise (which takes 
over for K >> K_opt). It is interesting that the maximum performances 
of the three algorithms are practically the same. This suggests that the 
step sizes resulting from these algorithms may not be significantly 
different, when optimal K values are employed. The rest of the paper 
will assume the use of the "average absolute slope" formula 

$$
\Delta = \frac{K_{\mathrm{opt}}}{N-1} \sum_{r=2}^{N} |X_r - X_{r-1}| \tag{4}
$$

Fig. 6 — Step-size computation for DM-AQF. 
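A block-wise step-size computation based on the average absolute slope can be sketched as below. The scale factor `k` stands for the constant that would be read from Fig. 6; its default of 1.0 is a placeholder assumption, and the function names are hypothetical. The peak-slope variant is included for comparison.

```python
def average_absolute_slope(block):
    """Mean of |X_r - X_{r-1}| over the N - 1 adjacent pairs in the block."""
    diffs = [abs(b - a) for a, b in zip(block, block[1:])]
    return sum(diffs) / len(diffs)

def maximum_absolute_slope(block):
    """Largest |X_r - X_{r-1}| in the block."""
    return max(abs(b - a) for a, b in zip(block, block[1:]))

def step_size(block, k=1.0):
    """Forward step-size estimate for one buffered block (k is an assumption)."""
    return k * average_absolute_slope(block)

# Hypothetical four-sample block.
example_block = [0.0, 2.0, 6.0, 6.0]
example_delta = step_size(example_block)
```

In a forward-adaptive coder this value would be computed once per buffered block before the block is delta modulated.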



Strictly speaking, this formula is optimal only for 24-kHz sampling 
and for perfect integrators (h_1 = 1). However, corrections for these 
factors were not found to be very significant for the values of f and 
h_1 used in our study, and formula (4) was therefore uniformly assumed 
for simplicity. (We may mention, however, that step-size dependencies 
on h_1 and bit rate will be of interest in the design of DPCM-AQF 
encoders; this is seen in Section IV.) 

The buffer length was set at N = 256. This represents a compromise 
among three factors: (i) a need to minimize the encoding delay (this 
suggests smaller N values), (ii) a need to keep down the information 
rate in the step-size transmitting channel (this suggests slower 
updating, or larger N values), and (iii) the need to track the changing 
statistics of speech with an appropriate speed. A buffer length of 5 to 
10 ms turns out to be a good choice for differential coding (this is 
demonstrated quantitatively in the context of DPCM-AQF); and N = 256 
does indeed correspond approximately to a 10-ms delay for f = 24 kHz 
(and a 5-ms delay for f = 48 kHz). 

The predictor coefficient was set to be h_1 = 0.9. This was nearly 
optimal from an SNRT viewpoint for the sampling frequencies of interest. 
Over a noisy channel, if one uses SNRR as a performance criterion, 
optimal values of h_1 tend to be smaller than 0.9. This is because 
"leakier" integrators mitigate error propagation in the output of a 
differential decoder. Once again, in the interest of simplicity, a 
quantitative consideration of this phenomenon has been deferred to the case 
of multibit DPCM coding (Section IV). 

3.2 Bit scrambling 

Table II demonstrates how bit scrambling can provide an SNRR 
advantage in the presence of errors. As mentioned earlier, bit 
scrambling was simulated by using scrambled errors ES (in place of burst errors 
EB for an unscrambled bit stream). Informal listening tests indicate 
that the perceptual advantages of bit scrambling in DM-AQF are more 
significant than what the SNRR gains in Table II may suggest. 

3.3 Error protection by redundant coding: EP-DM-AQF 

We studied a redundant DM-AQF coder in which every pair of 
adjacent DM bits was protected by the transmission of a (contiguous) 
parity check bit. When the parity failed at the receiver, a possible bit 
error was detected, and the received DM bit pair was forced to form 
an alternating (+ − or − +) sequence. This is equivalent to the 
imposition of a zero-slope segment in the speech waveform when the 
receiver has no confidence in the incoming bits. Table III compares the 
performance of this error-protected system (EP-DM-AQF) with that of an 
unprotected DM-AQF coder, for the example of scrambled errors. The 
unprotected system has a bit rate of 32 kb/s, while the EP-DM-AQF 
operates at 32 × 3/2 = 48 kb/s. We are not concerned at this point with 
questions like a specific baud rate. However, in view of transmission 
rate versus error probability relations over real channels (Section II), 
the interesting comparison in Table III is between rows 1 and 3 
(rather than between 1 and 2). It appears that the simple 
parity-check-based error protection is not useful; the advantages due 
to error detection at the receiver are being offset (or more than offset) 
by the increased error probability characteristic of the higher 
transmission rate in EP-DM-AQF. A similar result has been obtained in a 
simulation of DM-AQF with correlated errors, and also with DPCM 
encoders where only the most significant bit is error-protected by the 
use of redundancy. 3 

Table II — Effect of bit scrambling in DM-AQF [P(EB) = P(ES) 
= 0.055 and entries are SNRR values in dB] 

f (kHz)   Speech   Burst Errors   Scrambled Errors 
32        Male     7.6            8.0 
40        Female   7.8            8.8 

Table III — Comparison of DM-AQF and EP-DM-AQF 

Scheme      f (kHz)   Transmission rate (kb/s)   P(ES)   SNRR (dB) 
EP-DM-AQF   32        48                         0.055   8.0 
DM-AQF      32        32                         0.055   7.1 
DM-AQF      32        32                         0.025   10.0 
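The parity-check scheme of this subsection can be sketched in Python. Two points are assumptions made for illustration only: the parity bit is formed as the product of the two ±1 bits, and on a parity failure the first received bit is kept while the second is set to its opposite (the text only requires that the pair alternate).

```python
def ep_encode(bits):
    """Append one parity bit after every pair of +/-1 delta-modulator bits."""
    out = []
    for b1, b2 in zip(bits[0::2], bits[1::2]):
        out.extend([b1, b2, b1 * b2])      # parity is +1 when the pair agrees
    return out

def ep_decode(received):
    """Check parity per triple; force an alternating pair when it fails."""
    out = []
    for i in range(0, len(received) - 2, 3):
        b1, b2, parity = received[i:i + 3]
        if b1 * b2 != parity:
            b2 = -b1                       # assumption: keep b1, alternate b2
        out.extend([b1, b2])
    return out

# Hypothetical four-bit stream; the redundancy raises the rate by 3/2.
dm_bits = [1, 1, -1, 1]
sent = ep_encode(dm_bits)
corrupted = list(sent)
corrupted[1] = -corrupted[1]               # single channel error in the first pair
recovered = ep_decode(corrupted)
```

A forced pair sums to zero, so the decoder staircase returns to (nearly) its previous level over the two samples, which is the zero-slope behaviour described in the text.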

3.4 Unprotected DM-AQF with bit scrambling; choice of f 

We have considered in some detail the specific case of unprotected 
(nonredundant) DM-AQF with bit scrambling. Table IV presents SNRT 
and SNRR values for such a system at different values of f and matched 
values of error probability, P(ES). Some entries in Table IV are 
interpolated values because error sequences with the corresponding P(ES) 
values were not available. As suggested earlier in the example of binary 
versus quaternary PSK, an obviously meaningful comparison is between 
rows 1 and 4, whose error probabilities differ by about a factor of two. 

Table IV — DM-AQF; Effect of f 

f (kHz)   P(ES)   SNRT (dB)   SNRR (dB) 
24        0.023   17.1        8.6 
32        0.032   21.2        9.6 
40        0.040   23.7        10.5 
48        0.048   26.0        11.2 

At the transmitter end, the quantization noise is easily perceived at 
f = 24 kHz. It is barely apparent at f = 32 kHz, and a choice of 
f = 40 kHz is likely to be more than adequate for many situations. 
Notice that, as f increases, so does the difference between SNRT and 
SNRR; and the quantization noise has a lesser and lesser influence on 
the speech quality at the receiver because of the relatively greater 
contributions of channel noise. 

3.5 A comparison with syllabic-companded DM-AQB 

To demonstrate that forward step-size coding is indeed desirable for 
the mobile radio channel, the DM-AQF scheme was compared with a 
syllabic-companded delta modulator with backward step-size control 
(AQB). The step-size algorithm for the DM-AQB was 

$$
\begin{aligned}
\Delta_r &= 0.966 \cdot \Delta_{r-1} + 25 \cdot [\mathrm{adapt}]_r, \\
[\mathrm{adapt}]_r &=
\begin{cases}
1 & \text{if } \left| \sum_{s=0}^{3} b_{r-p-s} \right| = 4, \text{ for } p = 0 \text{ or } 1 \text{ or } 2, \\
0 & \text{otherwise.}
\end{cases}
\end{aligned}
\tag{5}
$$

The algorithm is reminiscent of, if not identical to, the digitally 
controlled delta modulation (DCDM) scheme due to Greefkes, 7 which is an 
AQB technique well known for its error resistance. Figure 7 demonstrates 
that, in the presence of bit errors, the performance of DM-AQF degrades 
more gracefully than that of the DM-AQB defined in (5). It must be 
remembered, of course, that the DM-AQB system is implemented more 
easily and without encoding delay. 8 

3.6 The problem of step-size transmission in AQF 

We have tacitly assumed so far that step-size information in DM-AQF 
can be very reliably transmitted, even over a fading channel, because 
step-size updating has to be done only infrequently. We shall now 
demonstrate this with some numbers. 

Figure 8 illustrates a histogram of step sizes that resulted from 
utilizing (4) for a 32-kHz DM-AQF encoder. It was noted that the 
encoding was very tolerant to a maximum step-size constraint of 155, 
and a step-size resolution equal to 10; in other words, to a step-size 
dictionary of only 16 steps (5, 15, ..., 155). In practice, the 
maximum-to-minimum step-size ratio would probably be greater than 31, in 
anticipation of highly nonstationary speech inputs. 

The four-bit step-size information was transmitted as follows. At 
the beginning of each block of N = 256 bits, the respective four-bit 
word was transmitted five consecutive times. Each bit in the step-size 
word was decoded on the basis of a majority count over the five 
received versions of the bit. The step-size transmissions increased the 
overall bit rate from 32 kb/s to 32[(256 + 4 × 5)/256] = 34.5 kb/s. 
For a random error rate of P(ES) = 0.025, the SNRR with the explicit 
transmission of step size, as above, was nearly identical with the value 
obtained in a simulation that tacitly assumed the presence of correct 
step-size information at the receiver. The result is not surprising; the 
probability of failure of a majority count of order 5 is given by 

$$
P(\mathrm{M.C.};\,5) = \sum_{r=3}^{5} \binom{5}{r}\, p^{r} (1-p)^{5-r}
\approx 10\,p^{3} \quad \text{if } p \ll 1,
\tag{6}
$$

where p is the error probability. With p = P(ES) = 0.025, P(M.C.; 5) 
= 1.64 × 10^-4. The probability that at least one of the bits of a 
step-size word is wrongly decoded in our scheme is therefore no greater than 
6.4 in 10,000, and there were only 250 step-size transmissions during 
the entire length of the (2-s) speech utterance being coded. 

Fig. 7 — Comparison of DM-AQF and DM-AQB. 
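The repetition-and-majority scheme for the step-size word can be sketched in Python. The bit ordering of the four-bit word and the function names are assumptions made for this illustration. The failure probability is evaluated from the exact binomial sum, which comes out slightly below the 10p^3 approximation for small p.

```python
from math import comb

def majority_failure(p, n=5):
    """Probability that more than half of n independent copies are in error."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r)
               for r in range(n // 2 + 1, n + 1))

def send_step_size(index, repeats=5, width=4):
    """Send the four-bit word (MSB first, an assumption) `repeats` times."""
    word = [(index >> k) & 1 for k in reversed(range(width))]
    return word * repeats

def receive_step_size(bits, repeats=5, width=4):
    """Decode each bit of the word by a majority count over its copies."""
    index = 0
    for k in range(width):
        votes = sum(bits[k + width * j] for j in range(repeats))
        index = (index << 1) | (1 if 2 * votes > repeats else 0)
    return index

# Overall rate in kb/s when 4 x 5 step-size bits accompany each 256-bit block.
overall_rate = 32.0 * (256 + 4 * 5) / 256

# Hypothetical example: two of the five copies of the first bit are hit.
sent_word = send_step_size(11)
sent_word[0] ^= 1
sent_word[4] ^= 1
decoded_index = receive_step_size(sent_word)
```

Two errors among the five copies of a bit are still outvoted, so the word is recovered; three or more errors in the same bit position are needed for a failure.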

3.7 SNRT, SNRR, and P(E) as functions of time 

We conclude our discussion of DM-AQF with an interesting 
demonstration of the time dependencies of SNRT, SNRR, and P(ES), as measured 
over blocks that were N = 256 samples long. The sampling rate was 
24 kHz, the average error probability was 0.025 [refer to Fig. 2 and 
eq. (1) for error characterization], and the plots in Fig. 9 used numbers 
taken once every 20 blocks (5120 samples). Notice the obvious negative 
correlation between the time functions SNRR[t] and P(ES)[t]. The 
time variation of SNRT is, of course, purely a reflection of the input 
speech material. 

Fig. 8 — Histogram of step sizes in DM-AQF (peak speech amplitude = 2048). 

IV. DPCM-AQF 

Figure 10 is a block diagram of differential PCM with forward 
step-size control. Differences from Fig. 5 consist in the use of a B-bit 
quantizer (B = 3 or 4 in this paper), and in the assumption of Nyquist-rate 
sampling, which obviates the need for a critical output filter. Basic 
DPCM notation is as follows: W is the normalized code word magnitude, 
e is the prediction error, and ê is the quantized value of e. The 
time-invariant (first-order) predictor coefficient is h_1, and the 
subscript r denotes an instantaneous (sampled) value. The received bits 
b'_q (q = 1, 2, ..., B) are different from the transmitted bits b_q if a 
corresponding error bit E equals 1. The step size is Δ; it is assumed to 
be recalculated once every N samples, and successfully error-protected 
in transmission. The following are the salient DPCM equations. 

$$
\begin{aligned}
b_{qr} &= \pm 1; \quad q = 1, 2, \ldots, B. \\
bb_{qr} &= 0.5\, b_{qr} + 0.5 = 0 \text{ or } 1. \\
e_r &= X_r - h_1 \cdot Y_{r-1}. \\
Y_r &= h_1 \cdot Y_{r-1} + \hat{e}_r. \\
Y'_r &= h_1 \cdot Y'_{r-1} + \hat{e}'_r. \\
\hat{e}_r &= W_r \cdot \Delta. \\
W_r &= \left\{ \sum_{q=2}^{B} 2^{B-q} \cdot bb_{qr} \right\} \cdot \operatorname{sgn} b_{1r}.
\end{aligned}
\tag{7}
$$

For any sample r, the sign of the code word W is the most significant 
bit b_1; the least significant bit is b_B. 

Fig. 9 — Time variation of SNRT, SNRR, and P(E) (V/λ = 36.2 Hz). 

Fig. 10 — Block diagram of DPCM-AQF codec. 
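A first-order DPCM loop with a fixed step size per block can be sketched in Python as follows. The quantizer details are assumptions: code words are kept as (sign, magnitude index) pairs rather than packed bits, the magnitude index is obtained by rounding and clipping, and the `offset` argument (0.0 by default, 0.5 for a mid-rise characteristic) is a hypothetical knob that is not specified in the text.

```python
def quantize(e, delta, B, offset=0.0):
    """Return (sign, magnitude index m) for a prediction error e."""
    levels = 2 ** (B - 1)                     # magnitudes available with B - 1 bits
    sign = 1 if e >= 0 else -1
    m = min(levels - 1, max(0, round(abs(e) / delta - offset)))
    return sign, m

def dpcm_encode(x, delta, B=3, h1=0.9, offset=0.0):
    """Encode one block; the local decoder output Y tracks the input."""
    words, y = [], 0.0
    for sample in x:
        e = sample - h1 * y                   # e_r = X_r - h1 * Y_{r-1}
        sign, m = quantize(e, delta, B, offset)
        y = h1 * y + sign * (m + offset) * delta
        words.append((sign, m))
    return words

def dpcm_decode(words, delta, h1=0.9, offset=0.0):
    """Receiver recursion Y'_r = h1 * Y'_{r-1} + (quantized error)."""
    out, y = [], 0.0
    for sign, m in words:
        y = h1 * y + sign * (m + offset) * delta
        out.append(y)
    return out

# Hypothetical input block and step size.
x = [0.0, 5.0, 12.0, 20.0, 18.0, 9.0, -3.0]
words = dpcm_encode(x, delta=4.0)
decoded = dpcm_decode(words, delta=4.0)
```

Without channel errors the receiver output equals the transmitter's local reconstruction, and as long as the quantizer is not overloaded the error stays within half a step.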

4.1 Design of Δ, N, and h_1 

AQF step sizes are derived (once for every block of N samples), using 
the formula 

$$
\Delta = K_4 \cdot \frac{1}{N-1} \sum_{r=2}^{N} |X_r - h_1 \cdot X_{r-1}| \tag{8}
$$
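The step-size rule for the differential PCM coder can be sketched in the same style as for delta modulation. The function and argument names are hypothetical, and the values passed for `k4` and `h1` in the example are placeholders rather than the design values of this paper.

```python
def dpcm_step_size(block, k4, h1):
    """Scaled mean magnitude of the first-order prediction residual
    X_r - h1 * X_{r-1}, taken over the N - 1 adjacent pairs of the block."""
    residuals = [abs(b - h1 * a) for a, b in zip(block, block[1:])]
    return k4 * sum(residuals) / len(residuals)

# Hypothetical constant block: each residual is 10 - 0.5 * 10 = 5.
example_step = dpcm_step_size([10.0, 10.0, 10.0], k4=1.0, h1=0.5)
```

Unlike the delta-modulation rule, the estimate depends on the predictor coefficient, so a change of h1 shifts the step size even for the same input block.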



Figures 11, 12, and 13 illustrate typical SNRT and SNRR dependencies on 
the parameters K_4, N, and h_1, respectively. The curves refer to the 
case of B = 4, P(EB) = 0.055, and to a redundant transmission 
technique described in Fig. 14. It is clear that SNRT-maximizing designs are 
significantly different from the SNRR-maximizing values. Rather than 
getting bogged down in the controversial question of whether SNRT 
or SNRR is to be used as a performance criterion, we have elected, 
arbitrarily, to discuss the following SNRR-maximizing designs that were 
approximately good for the P(E) range of 0.025 to 0.055: 

(9) 

Notice that, in Fig. 12, SNRR is maximum at N = 128. However, the 
encoding delay is less objectionable (8 ms, instead of 16 ms) with 
N = 64. Note also that SNRT-maximizing designs call for higher values 
of both h_1 and K_4. 

The maximum-to-minimum step-size ratio in the simulation was 
about 1000. It is possible to reduce this ratio to 100, and still provide 
useful coding of nonstationary speech. 1 Smaller step-size ratios enhance 
bit-error resistance. They also tend to simplify the problem of 
transmitting step-size information. 

Fig. 11 — Step-size computation for DPCM-AQF. 

Fig. 12 — Step-size updating in DPCM-AQF. 

4.2 Error-protected DPCM-AQF 

Figure 14 illustrates the use of time-diversity coding designed to 
protect dpcm bits from burst errors. The time-diversity is provided by 
the delay P that will be discussed presently. Figure 14a defines a 
three-bit ep-dpcm system where the most significant bit &i is trans- 
mitted three times, and the second most significant bit 6 2 is sent twice. 
The least significant bit b 3 is transmitted only once. At the receiving 
end, the value of bi is determined on the basis of a majority count over 
the three received versions. In regard to the magnitude bit 6 2 , if the two 
versions of b 2 do not agree, the receiver code word is forced to its 
smallest magnitude (the polarity is still defined by the unequivocally 
decoded value of 6i). This is equivalent to forcing a "minimal-slope" 
segment in the decoded speech waveform when the receiver is in doubt 
about the code- word magnitude. Figure 14b defines a four-bit ep-dpcm 
system where only the most significant bit 6i is error-protected. Once 
again, the decoding of 6 X at the receiver follows a majority count over 
the three received versions thereof. Assuming 8-kHz sampling, both 
ep-dpcm systems of Fig. 14 would operate at 48 kb/s. However, the 
three-bit system of Fig. 14a has the benefit of greater error protection. 
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Fig. 13 — Design of predictor coefficient hi in dpcm-aqf. 



Figure 15 shows the benefits of time diversity for the example of 
the four-bit system of Fig. 14b. It is interesting that SNRR is still 
tending to increase at P values as large as 1024. It can be expected 
that, if P >> [D + I]_average, successive repetitions of a given bit tend 
to be affected independently by the channel; D and I are the burst 
duration and spacing mentioned in Section II. The DPCM-AQF coders of 
this paper assume a uniform value of P = 768. For a bit rate of 48 kb/s, 
this implies a total encoding delay (from Fig. 14) of 2P bits, or about 
32 ms. 
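
The delay figure quoted above follows from simple arithmetic, sketched here for convenience. The function name and the `spans` parameter are hypothetical; the sketch assumes only that the total delay is two diversity spacings of P bits each, as stated in the text.

```python
def diversity_delay_ms(p_bits, bit_rate_bps, spans=2):
    """Encoding delay, in milliseconds, of `spans` spacings of P bits each."""
    return 1000.0 * spans * p_bits / bit_rate_bps


# P = 768 bits at 48 kb/s: 2 * 768 / 48000 s = 32 ms.
delay = diversity_delay_ms(768, 48_000)
```

With P = 1024, the largest value considered in Fig. 15, the same expression gives a delay of about 42.7 ms.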
Fig. 14: Time-diversity coding for EP-DPCM. 

4.3 EP-DPCM; choice of B 

We now compare the two 48-kb/s systems of Fig. 14. Table V shows, 
for two different speech inputs, the SNRR values obtained with the 
three-bit and four-bit systems. The greater error protection in the 
three-bit system seems to make it more robust, in spite of the better 
quantization-noise (SNRT) performance of a four-bit coder, and the 
better receiver-end quality of three-bit coded speech is very obvious 
in listening tests. The result is also mentioned by Noll (Ref. 3). It is 
true that four-bit coding can provide a 6-dB superiority in SNRT. On 
the other hand, the subjective SNRT in DPCM is known to be 
considerably higher than a measured objective SNRT (Ref. 1), and the 
SNRT of three-bit DPCM may prove to be subjectively adequate for some 
mobile links. In that case, the system of Fig. 14a would be a good 
configuration for error-protected DPCM. 

Fig. 15: Effect of time diversity on received-speech quality (V/λ = 36.2 Hz). 

Table V: EP-DPCM; comparison of three- and four-bit systems 
(entries are SNRR values in dB; rows 1 and 2 represent different speech inputs) 

P(EB)     B = 3     B = 4 
0.025     14.2      12.8 
0.055      9.7       9.3 

Table VI further demonstrates the benefits of error protection for 
B = 3. In view of the relationships between transmission rate and P(EB) 
mentioned in Section II, the interesting comparison in the table is 
between rows 1 and 3, not 1 and 2. It is seen that EP-DPCM at 48 kb/s 
provides a better SNRR than unprotected 24-kb/s DPCM, in spite of the 
higher error probabilities that accompany the 48-kb/s transmissions. 
This contrasts interestingly with the results of Table II, where error 
protection was seen to be ineffective for DM-AQF. The suitability of 
error protection for DPCM (and not DM) seems to be a direct consequence 
of the multibit quantization in DPCM: it is possible to isolate and 
error-protect only the more significant DPCM bits and incur an overall 
redundancy of 50 to 100 percent, whereas a majority count for a 24-kHz DM 
would immediately result in a transmission rate of 72 kb/s (and a 
redundancy of 200 percent). 
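
The rate and redundancy figures in this comparison can be checked with a short sketch. The function name and the tuple representation of the repetition pattern are hypothetical conveniences; the repetition counts themselves are those of Fig. 14 and of a three-fold majority count for DM.

```python
def ep_rate_and_redundancy(repeats, sample_rate_khz):
    """Return (transmission rate in kb/s, redundancy in percent).

    repeats: how many times each information bit of a code word is sent,
             most significant bit first, e.g. (3, 2, 1) for Fig. 14a.
    """
    info_bits = len(repeats)   # information bits per sample
    sent_bits = sum(repeats)   # transmitted bits per sample
    rate_kbps = sent_bits * sample_rate_khz
    redundancy_pct = 100.0 * (sent_bits - info_bits) / info_bits
    return rate_kbps, redundancy_pct


three_bit = ep_rate_and_redundancy((3, 2, 1), 8)     # Fig. 14a
four_bit = ep_rate_and_redundancy((3, 1, 1, 1), 8)   # Fig. 14b
dm_majority = ep_rate_and_redundancy((3,), 24)       # 24-kHz DM, each bit sent 3 times
```

Both DPCM patterns give 48 kb/s, with redundancies of 100 and 50 percent, respectively; the DM case gives 72 kb/s and 200 percent, in agreement with the figures quoted above.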

In a recently proposed, and not less effective, approach to EP-DPCM 
coding (Ref. 9), the DPCM bits are error-protected in suitably long 
blocks rather than on a bit-by-bit basis. The time-diversity reception 
consists in selecting one of two time-separated blocks on the basis of 
an autocorrelation-type quality evaluation at the receiver. 

Table VI: Benefits of error protection for DPCM (B = 3) 

Code                Transmission     P(EB)     SNRT     SNRR 
                    rate (kb/s) 
EP-DPCM             48               0.055     19.4     12.4 
Unprotected DPCM    24               0.055     19.4      7.1 
Unprotected DPCM    24               0.025     19.4      9.6 

4.4 Bit scrambling in DPCM 

Informal listening tests, as well as SNRR evaluations, have shown 
that bit scrambling, and the resulting error randomization, is much 
less effective for multibit DPCM than for DM. The reason for this is not 
well understood. However, situations exist where bit scrambling can 
provide nominal SNRR gains even for DPCM. These have been noted by 
Noll (Ref. 3). 

4.5 DPCM-AQB 

The problem of step-size transmission for DPCM-AQF is expected to 
be handled through techniques not very different from those discussed 
in the context of DM-AQF. Following the calculation procedures of that 
section, it is estimated that virtually error-free transmission of DPCM 
step size would be possible (for the error rates considered) by the 
expenditure of about 5 kb/s of channel capacity.* To indicate the 
desirability of dedicating this kind of channel capacity to step size, 
we investigated two types of backward step-size control. One of these 
was adaptive differential quantization with a one-word memory (Ref. 1). 
Here, the quantizer step size is modified for every sample by a factor 
determined solely by the magnitude of the latest code word W_r. The 
other adaptive scheme derived step-size information by an algorithm 
similar to the DPCM-AQF rule (8). The summation, however, was over the 
most recent n samples of quantized speech. Neither of the above backward 
schemes performed well enough with bit errors to merit inclusion of 
their results. It is conceivable, however, that, as in DM, some kind of 
slowly adapting or syllabic DPCM may provide a fair result for mobile 
radio. It is also conjectured that the performance of such a scheme 
would be upper-bounded by that of DPCM-AQF in the manner of Fig. 7. 
At least one approach to slowly companded DPCM has been proposed 
to date (Refs. 10, 11). 
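
The one-word-memory rule mentioned above can be illustrated by a minimal sketch. The multiplier values, the step-size limits, the initial step size, and all function names are illustrative assumptions and are not taken from this paper; the sketch shows only the general form of the rule, in which the step size is scaled by a factor selected by the magnitude of the latest code word.

```python
# Assumed multipliers, indexed by code-word magnitude (0 = smallest level).
# These values are illustrative only, not the design used in this paper.
ASSUMED_MULTIPLIERS = (0.9, 0.9, 1.25, 1.75)


def next_step_size(step, magnitude, multipliers=ASSUMED_MULTIPLIERS,
                   step_min=1.0, step_max=1000.0):
    """Scale the step size by a factor set only by the latest magnitude."""
    scaled = step * multipliers[magnitude]
    # Clamp to an assumed maximum-to-minimum step-size ratio of 1000.
    return min(max(scaled, step_min), step_max)


def step_size_track(magnitudes, initial_step=10.0):
    """Return the step size in force after each code word in the sequence."""
    steps = []
    step = initial_step
    for magnitude in magnitudes:
        step = next_step_size(step, magnitude)
        steps.append(step)
    return steps


if __name__ == "__main__":
    sent = [3, 3, 0, 1]
    received = [3, 0, 0, 1]  # second code word corrupted by a bit error
    print(step_size_track(sent))      # transmitter step sizes
    print(step_size_track(received))  # receiver step sizes after the error
```

In this toy example, a single corrupted code word leaves the receiver step size offset from the transmitter step size for all later samples, until one of the clamps is reached. This is one way to picture the sensitivity of backward adaptation to bit errors.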

V. CONCLUSION 

The object of this paper was to specify two differential coders, one 
from the DM family and the other from the DPCM class, that would be 
appropriate for digitizing speech in some types of mobile radio 
systems. The results of our work indeed suggest two such coders: a 
nonredundant 40-kHz DM-AQF coder with bit scrambling and an 
error-protected three-bit DPCM-AQF operating at a nominal 48 kb/s. The 
typical capabilities of these systems are summarized in Table VII, 
which is based on the example of a female utterance, "The lathe is a big 
tool." The transmission rates and error probabilities in Table VII are 
matched, albeit in a limited sense, as discussed earlier. Also, as 
emphasized already, the error rates in Table VII are worse-than-average 
numbers for many mobile radio links. 



* If the overall transmission rate of the system is constrained to be 48 kb/s, it may 
be possible to work with a sampling rate of about 7 kHz, instead of 8 kHz, to accom- 
modate the step-size information in the 48-kb/s channel (Ref. 4). 
Table VII: Comparison of DM-AQF and EP-DPCM-AQF 

Coder          B    f (kHz)    Transmission    Estimate of    Bit-Error         SNRT    SNRR 
                               rate (kb/s)     5 (kb/s)       Probability       (dB)    (dB) 
EP-DPCM-AQF    3    8          48 + 5          5              P(EB) = 0.055     20.5    14.5 
DM-AQF         1    40         40 + 5          <5             P(ES) = 0.045     23.7    10.1 

In assessing the coders of Table VII, it may be worth noting that the 
DM system is more flexible. For example, the DM sampling rate can be 
lowered to 32 kHz with only a 2.5-dB loss in maximum speech quality 
SNRT (Table IV). Further, if the refinements of time-diversity coding 
(in DPCM) and bit scrambling (in DM) are eschewed, it is our experience 
that the DM system will lose less in the process. 

Obviously, a common denominator in the above systems is adaptive 
differential quantization. Crudely speaking, adaptive quantization 
serves to squelch channel noise, while differential coding tends to 
smear it; and the combination appears to be perceptually very desirable 
in the context of mobile telephony. 

Formal perceptual studies in this subject should appropriately 
include other digital techniques such as nondifferential (PCM) and 
backward-adaptive (AQB) coders. The studies should also include the 
possible effects of encoding delay. Clearly, the amount of this delay 
depends on what combination of refinements (forward coding, bit 
scrambling, and time diversity) is employed; and if the total delay gets 
to be long enough, the benefits of a better SNRR (due to reliable 
step-size information, error randomization, and redundant error 
protection, respectively) may be accompanied by a loss of echo 
performance over certain kinds of networks. The best compromise between 
transmitted speech quality, received speech quality, and encoding delay 
is very likely to be system-specific; and the nature of this compromise 
may influence or define a selection among analog techniques, conventional 
digital schemes (AQB), and step-size transmitting codes (AQF). 